AITopics | human-object pair

Zero-shot Human-Object Interaction (HOI) detection aims to identify both seen and unseen HOI categories. A strong zero-shot HOI detector is supposed to be not only capable of discriminating novel interactions but also robust to positional distribution discrepancy between seen and unseen categories when locating human-object pairs. However, top-performing zero-shot HOI detectors rely on seen and predefined unseen categories to distill knowledge from CLIP and jointly locate human-object pairs without considering the potential positional distribution discrepancy, leading to impaired transferability. In this paper, we introduce CLIP4HOI, a novel framework for zero-shot HOI detection. CLIP4HOI is developed on the vision-language model CLIP and ameliorates the above issues in the following two aspects. First, to avoid the model from overfitting to the joint positional distribution of seen human-object pairs, we seek to tackle the problem of zero-shot HOI detection in a disentangled two-stage paradigm. To be specific, humans and objects are independently identified and all feasible human-object pairs are processed by Human-Object interactor for pairwise proposal generation. Second, to facilitate better transferability, the CLIP model is elaborately adapted into a fine-grained HOI classifier for proposal discrimination, avoiding data-sensitive knowledge distillation. Finally, experiments on prevalent benchmarks show that our CLIP4HOI outperforms previous approaches on both rare and unseen categories, and sets a series of state-of-the-art records under a variety of zero-shot settings.

clip4hoi, human-object pair, practical zero-shot hoi detection, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Mining the Benefits of Two-stage and One-stage HOI Detection

Neural Information Processing SystemsDec-24-2025, 11:19:15 GMT

Two-stage methods have dominated Human-Object Interaction~(HOI) detection for several years. Recently, one-stage HOI detection methods have become popular. In this paper, we aim to explore the essential pros and cons of two-stage and one-stage methods. With this as the goal, we find that conventional two-stage methods mainly suffer from positioning positive interactive human-object pairs, while one-stage methods are challenging to make an appropriate trade-off on multi-task learning, \emph{i.e.}, object detection, and interaction classification. Therefore, a core problem is how to take the essence and discard the dregs from the conventional two types of methods.

interaction classification, name change, two-stage and one-stage hoi detection, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.78)

Add feedback

8fd5bc08e744fe0dfe798c61d1575a22-Paper-Conference.pdf

Neural Information Processing SystemsOct-9-2025, 01:22:17 GMT

detection, machine learning, natural language, (16 more...)

Neural Information Processing Systems

Country:

Asia > China (0.04)
Africa > Central African Republic > Ombella-M'Poko > Bimbo (0.04)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
Information Technology > Sensing and Signal Processing > Image Processing (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Add feedback

8f1d43620bc6bb580df6e80b0dc05c48-Paper.pdf

Neural Information Processing SystemsAug-15-2025, 22:41:23 GMT

detection, machine learning, natural language, (16 more...)

Neural Information Processing Systems

Country: Asia > China > Beijing > Beijing (0.04)

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.69)

Add feedback

CLIP4HOI: Towards Adapting CLIP for Practical Zero-Shot HOI Detection

Neural Information Processing SystemsJan-19-2025, 15:01:05 GMT

Zero-shot Human-Object Interaction (HOI) detection aims to identify both seen and unseen HOI categories. A strong zero-shot HOI detector is supposed to be not only capable of discriminating novel interactions but also robust to positional distribution discrepancy between seen and unseen categories when locating human-object pairs. However, top-performing zero-shot HOI detectors rely on seen and predefined unseen categories to distill knowledge from CLIP and jointly locate human-object pairs without considering the potential positional distribution discrepancy, leading to impaired transferability. In this paper, we introduce CLIP4HOI, a novel framework for zero-shot HOI detection. CLIP4HOI is developed on the vision-language model CLIP and ameliorates the above issues in the following two aspects.

clip4hoi, human-object pair, practical zero-shot hoi detection, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Mining the Benefits of Two-stage and One-stage HOI Detection

Neural Information Processing SystemsJan-16-2025, 12:59:31 GMT

Two-stage methods have dominated Human-Object Interaction (HOI) detection for several years. Recently, one-stage HOI detection methods have become popular. In this paper, we aim to explore the essential pros and cons of two-stage and one-stage methods. With this as the goal, we find that conventional two-stage methods mainly suffer from positioning positive interactive human-object pairs, while one-stage methods are challenging to make an appropriate trade-off on multi-task learning, \emph{i.e.}, object detection, and interaction classification. Therefore, a core problem is how to take the essence and discard the dregs from the conventional two types of methods.

human-object pair, interaction classification, two-stage and one-stage hoi detection, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.82)

Add feedback

Efficient Two-Stage Detection of Human-Object Interactions with a Novel Unary-Pairwise Transformer

Zhang, Frederic Z., Campbell, Dylan, Gould, Stephen

arXiv.org Artificial IntelligenceDec-3-2021

Recent developments in transformer models for visual data have led to significant improvements in recognition and detection tasks. In particular, using learnable queries in place of region proposals has given rise to a new class of one-stage detection models, spearheaded by the Detection Transformer (DETR). Variations on this one-stage approach have since dominated human-object interaction (HOI) detection. However, the success of such one-stage HOI detectors can largely be attributed to the representation power of transformers. We discovered that when equipped with the same transformer, their two-stage counterparts can be more performant and memory-efficient, while taking a fraction of the time to train. In this work, we propose the Unary-Pairwise Transformer, a two-stage detector that exploits unary and pairwise representations for HOIs. We observe that the unary and pairwise parts of our transformer network specialise, with the former preferentially increasing the scores of positive examples and the latter decreasing the scores of negative examples. We evaluate our method on the HICO-DET and V-COCO datasets, and significantly outperform state-of-the-art approaches. At inference time, our model with ResNet50 approaches real-time performance on a single GPU.

detection, detector, interaction, (15 more...)

arXiv.org Artificial Intelligence

2112.01838

Country: Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report > Promising Solution (0.48)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

Add feedback

Spatio-attentive Graphs for Human-Object Interaction Detection

Zhang, Frederic Z., Campbell, Dylan, Gould, Stephen

arXiv.org Artificial IntelligenceDec-10-2020

We address the problem of detecting human--object interactions in images using graphical neural networks. Our network constructs a bipartite graph of nodes representing detected humans and objects, wherein messages passed between the nodes encode relative spatial and appearance information. Unlike existing approaches that separate appearance and spatial features, our method fuses these two cues within a single graphical model allowing information conditioned on both modalities to influence the prediction of interactions with neighboring nodes. Through extensive experimentation we demonstrate the advantages of fusing relative spatial information with appearance features in the computation of adjacency structure, message passing and the ultimate refined graph features. On the popular HICO-DET benchmark dataset, our model outperforms state-of-the-art with an mAP of 27.18, a 10% relative improvement.

detection, information, interaction, (16 more...)

arXiv.org Artificial Intelligence

2012.0606

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

Add feedback

Collaborating Authors

human-object pair

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

8fd5bc08e744fe0dfe798c61d1575a22-Paper-Conference.pdf

MiningtheBenefitsofTwo-stageandOne-stage HOIDetection

CLIP4HOI: Towards Adapting CLIP for Practical Zero-Shot HOI Detection

Mining the Benefits of Two-stage and One-stage HOI Detection

8fd5bc08e744fe0dfe798c61d1575a22-Paper-Conference.pdf

8f1d43620bc6bb580df6e80b0dc05c48-Paper.pdf

CLIP4HOI: Towards Adapting CLIP for Practical Zero-Shot HOI Detection

Mining the Benefits of Two-stage and One-stage HOI Detection

Efficient Two-Stage Detection of Human-Object Interactions with a Novel Unary-Pairwise Transformer

Spatio-attentive Graphs for Human-Object Interaction Detection